TIME-VARYING NONLINEAR REGRESSION MODELS:
This paper considers a general class of nonparametric time series
regression models where the regression function can be time-dependent.
We establish an asymptotic theory for estimates of the time-varying
regression functions. For this general class of models, an important
issue in practice is to address the necessity of modeling the regression
function as nonlinear and time-varying. To tackle this, we propose an
information criterion and prove its selection consistency property. The
results are applied to the U.S. Treasury interest rate data.

1. Introduction. Consider the time-varying regression model

(1.1)  Model I:  $y_i = m_i(\mathbf{x}_i) + e_i, \quad i = 1,\ldots,n,$

where $y_i$, $\mathbf{x}_i$ and $e_i$ are the responses, the predictors and the errors,
respectively, and $m_i(\cdot) = m(\cdot, i/n)$ is a time-varying regression function. Here
$m : \mathbb{R}^d \times [0,1] \to \mathbb{R}$ is a smooth function, and $i/n$, $i = 1,\ldots,n$,
represents the time rescaled to the unit interval. Model I is very general. If $m_i(\cdot)$ is not
time-varying, then (1.1) becomes

Model II:  $y_i = \mu(\mathbf{x}_i) + e_i, \quad i = 1,\ldots,n.$

Model II has been extensively studied in the literature; see Robinson (1983),
Györfi et al. (1989), Fan and Yao (2003) and Li and Racine (2007), among
others. As an important example, (1.1) can be viewed as the discretized
version of the nonstationary diffusion process

(1.2)  $dy_t = m(y_t, t/T)\,dt + \sigma(y_t, t/T)\,d\mathbb{B}_t,$

where $\{\mathbb{B}_s\}_{s \in \mathbb{R}}$ is a standard Brownian motion, $m(\cdot,\cdot)$ and
$\sigma(\cdot,\cdot)$ are, respectively, the drift and the volatility functions, which can both be
time-varying, and $T$ is the time horizon under consideration. If the functions $m(\cdot,\cdot)$ and
$\sigma(\cdot,\cdot)$ do not depend on time, then (1.2) becomes the stationary diffusion process

(1.3)  $dy_t = \mu(y_t)\,dt + \gamma(y_t)\,d\mathbb{B}_t,$

which relates to model II. There is a huge literature on modeling interest
rates data by (1.3). For example, Vasicek (1977) considered model (1.3)
with linear drift function $\mu(x) = \beta_0 + \beta_1 x$ and constant volatility $\gamma(x) = \gamma$,
where $\beta_0, \beta_1, \gamma$ are unknown parameters. Courtadon (1982), Cox, Ingersoll
and Ross (1985) and Chan et al. (1992) considered nonconstant volatility
functions. Aït-Sahalia (1996), Stanton (1997) and Liu and Wu (2010) studied
model (1.3) with nonlinear drift functions. See Zhao (2008) for a review.
However, due to policy and societal changes, those models with a static
relationship between responses and predictors may not be suitable. Here we
shall study estimates of the time-varying regression function $m_i(\cdot)$ for model (1.1).

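For model II, let $K_S(\cdot)$ be a $d$-dimensional kernel function and

(1.4)  $\hat T_n(\mathbf{u}) = \frac{1}{n h_n^d} \sum_{i=1}^n y_i K_S\Big(\frac{\mathbf{u} - \mathbf{x}_i}{h_n}\Big), \qquad \hat f_n(\mathbf{u}) = \frac{1}{n h_n^d} \sum_{i=1}^n K_S\Big(\frac{\mathbf{u} - \mathbf{x}_i}{h_n}\Big),$

where $(h_n)$ is a bandwidth sequence. We can then apply the traditional
Nadaraya-Watson estimate for the regression function $\mu(\cdot)$,

(1.5)  $\hat\mu_n(\mathbf{u}) = \frac{\hat T_n(\mathbf{u})}{\hat f_n(\mathbf{u})}, \qquad \mathbf{u} \in \mathbb{R}^d.$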

If the process $(\mathbf{x}_i)$ is stationary, then $\hat f_n$ is the kernel density estimate of its
marginal density. For stationary processes, an asymptotic theory for these
nonparametric estimators has been developed by many researchers, including
Robinson (1983), Castellana and Leadbetter (1986), Silverman (1986),
Györfi et al. (1989), Yu (1993), Tjøstheim (1994), Wand and Jones (1995),
Bosq (1996), Neumann (1998), Neumann and Kreiss (1998), Fan and Yao
(2003) and Li and Racine (2007), among others. However, the case of
nonstationary processes has rarely been touched. Hall, Müller and Wu (2006)
considered the situation that the underlying distribution evolves with time
and proposed a nonparametric time-dynamic density estimator. Assuming
independence, they proved the consistency of their kernel-type estimators
and applied the results to fast mode tracking. Following the spirit of Hall,
Müller and Wu (2006), Vogt (2012) considered a kernel estimator of the
time-varying regression model (1.1), and established its asymptotic normality
and uniform bound under the classical strong mixing conditions. In
Sections 3.1 and 3.2, we advance the nonparametric estimation theory for the
time-varying regression model (1.1) under the framework of Draghicescu,
Guillas and Wu (2009), which is convenient to use and often leads to
optimal asymptotic results.
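
As a rough illustration of the time-constant estimates (1.4) and (1.5), the following minimal sketch computes the Nadaraya-Watson estimate for a scalar predictor ($d = 1$). The function names and the choice of the Epanechnikov kernel are assumptions made for the example, not part of the estimator's definition.

```python
import math


def epanechnikov(v):
    """Epanechnikov kernel supported on [-1, 1] (an illustrative choice)."""
    return 0.75 * (1.0 - v * v) if abs(v) <= 1.0 else 0.0


def nadaraya_watson(u, x, y, h, kernel=epanechnikov):
    """Time-constant estimate T_hat(u) / f_hat(u) for scalar predictors."""
    n = len(x)
    weights = [kernel((u - xi) / h) for xi in x]
    # f_hat(u): kernel density estimate with bandwidth h (d = 1).
    f_hat = sum(weights) / (n * h)
    # T_hat(u): the same kernel sum weighted by the responses.
    t_hat = sum(w * yi for w, yi in zip(weights, y)) / (n * h)
    if f_hat == 0.0:
        return float("nan")  # no observation within one bandwidth of u
    return t_hat / f_hat
```

The common factor $1/(n h_n)$ cancels in the ratio; it is kept here only so that the intermediate quantities match (1.4).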

Apart from model II, model I contains another important special case:
the time-varying coefficient linear regression model

Model III:  $y_i = \mathbf{x}_i^\top \beta_i + e_i, \quad i = 1,\ldots,n,$

where $^\top$ is the transpose and $\beta_i = \beta(i/n)$ for some smooth function
$\beta : [0,1] \to \mathbb{R}^d$. The traditional linear regression model

Model IV:  $y_i = \mathbf{x}_i^\top \theta + e_i, \quad i = 1,\ldots,n,$

where $\theta \in \mathbb{R}^d$ is the regression coefficient, is a special case of model III.
Estimation of $\beta(\cdot)$ has been considered by Hoover et al. (1998), Fan and Zhang
(2000a, 2000b), Huang, Wu and Zhou (2004), Ramsay and Silverman (2005),
Cai (2007) and Zhou and Wu (2010), among others. The problem of
distinguishing between models III and IV has been studied in the literature mainly
by means of hypothesis testing; see, for example, Chow (1960), Brown,
Durbin and Evans (1975), Nabeya and Tanaka (1988), Leybourne and
McCabe (1989), Nyblom (1989), Ploberger, Krämer and Kontrus (1989),
Andrews (1993), Davis, Huang and Yao (1995), Lin and Teräsvirta (1999) and
He, Teräsvirta and González (2009). On the other hand, model IV specifies
a linear relationship upon model II, and there is a huge literature on testing
parametric forms of $\mu(\cdot)$; see Azzalini and Bowman (1993), González
Manteiga and Cao (1993), Härdle and Mammen (1993), Zheng (1996), Dette
(1999), Fan, Zhang and Zhang (2001), Zhang and Dette (2004) and Zhang
and Wu (2011), among others. Nevertheless, model selection between models
II and III has received much less attention. Note that both of them are nested
in the general model I, and they all cover the linear regression model IV. It
is desirable to develop a model selection criterion. An information criterion
is proposed in Section 3.3, where its consistency property is obtained.

The rest of the paper is organized as follows. Section 2 introduces the
model setting. Main results are stated in Section 3 and are proved in
Section 6 with some of the proofs postponed to the supplementary material
[Zhang and Wu (2015)]. A simulation study is given in Section 4 along with
an application to the U.S. Treasury interest rate data.

2. Model setting. For estimation of model I, temporal dynamics should
be taken into consideration. Let $K_T(\cdot)$ be a temporal kernel function (kernel
function for time), $b_n$ be another sequence of bandwidths and

$w_{b_n,i}(t) = K_T\{(i/n - t)/b_n\}\,\frac{S_2(t) - (t - i/n) S_1(t)}{S_2(t) S_0(t) - S_1(t)^2}$

be the local linear weights, where $S_l(t) = \sum_{j=1}^n (t - j/n)^l K_T\{(j/n - t)/b_n\}$, $l \in \{0,1,2\}$.
Let $K_{S,h_n}(\cdot) = h_n^{-d} K_S(\cdot/h_n)$,

(2.1)  $\hat f_n(\mathbf{u},t) = \sum_{i=1}^n K_{S,h_n}(\mathbf{u} - \mathbf{x}_i)\, w_{b_n,i}(t), \qquad \hat T_n(\mathbf{u},t) = \sum_{i=1}^n y_i K_{S,h_n}(\mathbf{u} - \mathbf{x}_i)\, w_{b_n,i}(t),$

we consider the time-varying kernel regression estimator

(2.2)  $\hat m_n(\mathbf{u},t) = \frac{\hat T_n(\mathbf{u},t)}{\hat f_n(\mathbf{u},t)}.$
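
The construction in (2.1) and (2.2) can be sketched in a few lines for a scalar predictor ($d = 1$). This is a minimal illustration under stated assumptions: the function names are hypothetical, the Epanechnikov kernel is used for both the spatial and temporal kernels, and observation $i$ is placed at rescaled time $i/n$.

```python
def epanechnikov(v):
    """Epanechnikov kernel supported on [-1, 1] (an illustrative choice)."""
    return 0.75 * (1.0 - v * v) if abs(v) <= 1.0 else 0.0


def local_linear_weights(t, n, b, kernel=epanechnikov):
    """Local linear weights w_{b,i}(t), i = 1..n, built from S_0, S_1, S_2."""
    k = [kernel((i / n - t) / b) for i in range(1, n + 1)]
    s = [sum((t - i / n) ** l * k[i - 1] for i in range(1, n + 1))
         for l in range(3)]
    denom = s[2] * s[0] - s[1] ** 2
    return [k[i - 1] * (s[2] - (t - i / n) * s[1]) / denom
            for i in range(1, n + 1)]


def tv_kernel_regression(u, t, x, y, h, b, kernel=epanechnikov):
    """Time-varying estimate T_hat(u, t) / f_hat(u, t) for scalar x."""
    n = len(x)
    w = local_linear_weights(t, n, b, kernel)
    ks = [kernel((u - xi) / h) / h for xi in x]  # K_{S,h}(u - x_i) with d = 1
    f_hat = sum(k * wi for k, wi in zip(ks, w))
    t_hat = sum(yi * k * wi for yi, k, wi in zip(y, ks, w))
    return t_hat / f_hat if f_hat != 0.0 else float("nan")
```

By construction the weights sum to one and annihilate the linear term $\sum_i (t - i/n) w_{b_n,i}(t)$, which is what removes the first-order temporal bias, including near the boundary of $[0,1]$.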

Hall, Müller and Wu (2006) proved the uniform consistency of $\hat f_n$ in (2.1)
by assuming that $\mathbf{x}_1,\ldots,\mathbf{x}_n$ are independent. To allow nonstationary and
dependent observations, we assume

(2.3)  $\mathbf{x}_i = G(i/n; \mathcal{H}_i), \quad \text{where } \mathcal{H}_i = (\ldots, \xi_{i-1}, \xi_i),$

and $\xi_k$, $k \in \mathbb{Z}$, are independent and identically distributed (i.i.d.) random
vectors, and $G$ is a measurable function such that $G(t; \mathcal{H}_i)$ is well defined for
each $t \in [0,1]$. Following Draghicescu, Guillas and Wu (2009), the framework
(2.3) suggests locally strict stationarity and is convenient for asymptotic
study. For the error process, we assume that

(2.4)  $e_i = \sigma_i(\mathbf{x}_i)\eta_i = \sigma(\mathbf{x}_i, i/n)\eta_i,$

where $\sigma(\cdot,\cdot) : \mathbb{R}^d \times [0,1] \to \mathbb{R}$ is a smooth function, and $(\eta_i)$ is a sequence of
random variables satisfying $E(\eta_i \mid \mathbf{x}_i) = 0$ and $E(\eta_i^2 \mid \mathbf{x}_i) = 1$. At the outset (cf.
Sections 3.1-3.3) we assume that $\eta_k$, $k \in \mathbb{Z}$, are i.i.d. and independent of $\mathcal{H}_j$,
$j \in \mathbb{Z}$. The latter assumption can be relaxed (though technically much more
tedious) to allow models with correlated errors and nonlinear autoregressive
processes; see Section 3.4.

For a random vector $Z$, we write $Z \in \mathcal{L}^q$, $q > 0$, if $\|Z\|_q = \{E(|Z|^q)\}^{1/q} < \infty$,
where $|\cdot|$ is the Euclidean vector norm, and we denote $\|\cdot\| = \|\cdot\|_2$. Let
$F_1(\mathbf{u}, t \mid \mathcal{H}_k) = \mathrm{pr}\{G(t; \mathcal{H}_{k+1}) \le \mathbf{u} \mid \mathcal{H}_k\}$ be the one-step ahead predictive or
conditional distribution function and $f_1(\mathbf{u}, t \mid \mathcal{H}_k) = \partial^d F_1(\mathbf{u}, t \mid \mathcal{H}_k)/\partial \mathbf{u}$ be the
corresponding conditional density. Let $(\xi_i')$ be an i.i.d. copy of $(\xi_j)$ and
$\mathcal{H}_k' = (\ldots, \xi_{-1}, \xi_0', \xi_1, \ldots, \xi_k)$ be the coupled shift process. We define the
predictive dependence measure

(2.5)  $\psi_{k,q} = \sup_{t \in [0,1]} \sup_{\mathbf{u} \in \mathbb{R}^d} \| f_1(\mathbf{u}, t \mid \mathcal{H}_k) - f_1(\mathbf{u}, t \mid \mathcal{H}_k') \|_q.$

Quantity (2.5) measures the contribution of $\xi_0$, the innovation at step 0,
on the conditional or predictive distribution at step $k$. We shall make the
following assumptions:
following assumptions: 
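(A1) smoothness (third order continuous differentiability): $f, m, \sigma \in \mathcal{C}^3(\mathbb{R}^d \times [0,1])$;

(A2) short-range dependence: $\Psi_{0,2} < \infty$, where $\Psi_{m,q} = \sum_{k=m}^{\infty} \psi_{k,q}$;

(A3) there exists a constant $c_0 < \infty$ such that almost surely,

$\sup_{t \in [0,1]} \sup_{\mathbf{u} \in \mathbb{R}^d} \{ f_1(\mathbf{u}, t \mid \mathcal{H}_0) + |\partial^d f_1(\mathbf{u}, t \mid \mathcal{H}_0)/\partial \mathbf{u}| \} \le c_0.$

Condition (A3) implies that the marginal density $f(\mathbf{u},t) = E\{f_1(\mathbf{u},t \mid \mathcal{H}_0)\} \le c_0$.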


3. Main results. 


3.1. Nonparametric kernel estimation. Throughout the paper, we assume
that the kernel functions $K_S(\cdot)$ and $K_T(\cdot)$ are both symmetric and
twice continuously differentiable on their support $[-1,1]^d$ and $[-1,1]$,
respectively, and $\int_{[-1,1]^d} K_S(\mathbf{s})\,d\mathbf{s} = \int_{-1}^{1} K_T(v)\,dv = 1$. Denote by "$\Rightarrow$"
convergence in distribution. Theorem 3.1 provides the asymptotic normality of the
time-varying kernel estimators (2.1) and (2.2), while Theorem 3.2 concerns
the time-constant estimators (1.4) and (1.5).


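Theorem 3.1. Assume (A1)-(A3) and $\eta_i \in \mathcal{L}^p$, $p > 2$, are i.i.d. Let
$(\mathbf{u}, t) \in \mathbb{R}^d \times (0,1)$ be a fixed point. If $b_n \to 0$, $h_n \to 0$ and $n b_n h_n^d \to \infty$, then

(3.1)  $(n b_n h_n^d)^{1/2} [\hat f_n(\mathbf{u},t) - E\{\hat f_n(\mathbf{u},t)\}] \Rightarrow N\{0, f(\mathbf{u},t) \lambda_{K_S} \lambda_{K_T}\},$

where $\lambda_{K_T} = \int_{-1}^{1} K_T(v)^2\,dv$ and $\lambda_{K_S} = \int_{[-1,1]^d} K_S(\mathbf{s})^2\,d\mathbf{s}$. If in addition
$f(\mathbf{u},t) > 0$, then

(3.2)  $(n b_n h_n^d)^{1/2} \Big[ \hat m_n(\mathbf{u},t) - \frac{E\{\hat T_n(\mathbf{u},t)\}}{E\{\hat f_n(\mathbf{u},t)\}} \Big] \Rightarrow N\Big\{0, \frac{\sigma(\mathbf{u},t)^2 \lambda_{K_S} \lambda_{K_T}}{f(\mathbf{u},t)}\Big\}.$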


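Let $H_f(\mathbf{u},t) = \{\partial^2 f(\mathbf{u},t)/\partial u_i \partial u_j\}_{1 \le i,j \le d}$ be the Hessian matrix of the
density function $f$ with respect to $\mathbf{u}$. Denote $f^{(0,2)}(\mathbf{u},t) = \partial^2 f(\mathbf{u},t)/\partial t^2$, and
we use the same notation for the product function $(mf)(\mathbf{u},t) = m(\mathbf{u},t) f(\mathbf{u},t)$.
Then for any point $(\mathbf{u},t) \in \mathbb{R}^d \times (0,1)$ with $f(\mathbf{u},t) > 0$, we have

$E\{\hat f_n(\mathbf{u},t)\} = f(\mathbf{u},t) + \frac{h_n^2}{2} \mathrm{tr}\{H_f(\mathbf{u},t) \kappa_S\} + \frac{b_n^2}{2} f^{(0,2)}(\mathbf{u},t) \kappa_T + O(b_n^3 + h_n^3),$

where $\mathrm{tr}(\cdot)$ is the trace operator, $\kappa_T = \int_{-1}^{1} v^2 K_T(v)\,dv$ and
$\kappa_S = \int_{[-1,1]^d} \mathbf{s}\mathbf{s}^\top K_S(\mathbf{s})\,d\mathbf{s}$, and

$\frac{E\{\hat T_n(\mathbf{u},t)\}}{E\{\hat f_n(\mathbf{u},t)\}} = m(\mathbf{u},t) + \frac{h_n^2}{2 f(\mathbf{u},t)} \mathrm{tr}[\{H_{mf}(\mathbf{u},t) - m(\mathbf{u},t) H_f(\mathbf{u},t)\} \kappa_S]$

$\qquad + \frac{b_n^2}{2 f(\mathbf{u},t)} \{(mf)^{(0,2)}(\mathbf{u},t) - m(\mathbf{u},t) f^{(0,2)}(\mathbf{u},t)\} \kappa_T$

$\qquad + O(b_n^3 + h_n^3).$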

Hence (2.1) and (2.2) are consistent estimates of the local density function
$f$ and the regression function $m$, respectively. The asymptotic mean
squared error (AMSE) optimal bandwidths satisfy $b_n \asymp n^{-1/(d+5)}$ and $h_n \asymp n^{-1/(d+5)}$.
Here for positive sequences $(s_n)$ and $(r_n)$, we write $s_n \asymp r_n$ if
$s_n/r_n + r_n/s_n$ is bounded for all large $n$.


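Theorem 3.2. Assume (A1)-(A3) and $\eta_i \in \mathcal{L}^p$, $p > 2$. If $h_n \to 0$ and
$n h_n^d \to \infty$, then

(3.3)  $(n h_n^d)^{1/2} [\hat f_n(\mathbf{u}) - E\{\hat f_n(\mathbf{u})\}] \Rightarrow N\{0, \bar f(\mathbf{u}) \lambda_{K_S}\}, \qquad \mathbf{u} \in \mathbb{R}^d,$

where $\bar f(\mathbf{u}) = \int_0^1 f(\mathbf{u},t)\,dt$. If in addition $\bar f(\mathbf{u}) > 0$, then

(3.4)  $(n h_n^d)^{1/2} \Big[ \hat\mu_n(\mathbf{u}) - \frac{E\{\hat T_n(\mathbf{u})\}}{E\{\hat f_n(\mathbf{u})\}} \Big] \Rightarrow N\{0, V(\mathbf{u}) \lambda_{K_S}\},$

where, letting $\bar m(\mathbf{u}) = \int_0^1 m(\mathbf{u},t) f(\mathbf{u},t)\,dt / \bar f(\mathbf{u})$, the variance function

$V(\mathbf{u}) = \bar f(\mathbf{u})^{-2} \int_0^1 [\{m(\mathbf{u},t) - \bar m(\mathbf{u})\}^2 + \sigma(\mathbf{u},t)^2] f(\mathbf{u},t)\,dt.$

For any point $\mathbf{u} \in \mathbb{R}^d$ with $\bar f(\mathbf{u}) > 0$, we have

$E\{\hat f_n(\mathbf{u})\} = \bar f(\mathbf{u}) + \frac{h_n^2}{2} \mathrm{tr}\Big\{ \int_0^1 H_f(\mathbf{u},t) \kappa_S\,dt \Big\} + O(h_n^3)$

and

$\frac{E\{\hat T_n(\mathbf{u})\}}{E\{\hat f_n(\mathbf{u})\}} = \bar m(\mathbf{u}) + \frac{h_n^2}{2 \bar f(\mathbf{u})} \mathrm{tr}\Big[ \int_0^1 \{H_{mf}(\mathbf{u},t) - m(\mathbf{u},t) H_f(\mathbf{u},t)\} \kappa_S\,dt \Big] + O(h_n^3).$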

Therefore, (1.4) and (1.5) provide consistent estimators of $\bar f$ and $\bar m$,
(weighted) temporal averages of the local density function $f$ and the
regression function $m$, respectively. For stationary processes, Theorem 3.2
relates to traditional results on nonparametric kernel estimators; see, for
example, Robinson (1983), Bosq (1996) and Wu (2005). The AMSE optimal
bandwidth for the time-constant kernel estimators (1.4) and (1.5) satisfies
$h_n \asymp n^{-1/(d+4)}$.

3.2. Uniform bounds. For stationary or independent observations,
uniform bounds for kernel estimators have been obtained by Peligrad (1992),
Andrews (1995), Bosq (1996), Masry (1996), Fan and Yao (2003) and Hansen
(2008), among others. Hall, Müller and Wu (2006) obtained a uniform bound
for time-varying kernel density estimators for independent observations,
while Vogt (2012) considered kernel regression estimators under strong
mixing conditions. We shall here establish uniform bounds for the time-varying
kernel estimators (2.1) and (2.2) under the locally strict stationarity
framework (2.3). We need the following assumptions:

(A4) there exists a $q > 2$ such that $\Psi_{0,q} < \infty$ and $\Psi_{m,q} = O(m^{-\alpha})$ for
some $\alpha > 1/2 - 1/q$;

(A5) let $\mathcal{X} \subset \mathbb{R}^d$ be a compact set, and assume $\inf_{t \in [0,1]} \inf_{\mathbf{u} \in \mathcal{X}} f(\mathbf{u},t) > 0$.


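Theorem 3.3. Assume (A1), (A3)-(A5), $b_n \to 0$, $h_n \to 0$ and $n b_n h_n^d \to \infty$.
(i) If there exists $r > r' > 0$ such that $\sup_{t \in [0,1]} \|G(t; \mathcal{H}_0)\|_r < \infty$ and
$\ldots \to 0$, then

$\sup_{t \in [0,1]} \sup_{\mathbf{u} \in \mathbb{R}^d} |\hat f_n(\mathbf{u},t) - E\{\hat f_n(\mathbf{u},t)\}| = O_p\Big\{ \frac{(\log n)^{1/2}}{(n b_n h_n^d)^{1/2}} \Big\}.$

(ii) If $\eta_i \in \mathcal{L}^p$ for some $p > 2$, and $\ldots \to 0$, then

$\sup_{t \in [0,1]} \sup_{\mathbf{u} \in \mathcal{X}} \Big| \hat m_n(\mathbf{u},t) - \frac{E\{\hat T_n(\mathbf{u},t)\}}{E\{\hat f_n(\mathbf{u},t)\}} \Big| = O_p\Big\{ \frac{(\log n)^{1/2}}{(n b_n h_n^d)^{1/2}} + \frac{n^{1/p} \log n}{n b_n h_n^d} \Big\}.$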

If the bandwidths $b_n \asymp n^{-1/(d+5)}$ and $h_n \asymp n^{-1/(d+5)}$ have the optimal
AMSE rate, and $\eta_i \in \mathcal{L}^p$ for some $p > (d+5)/2$, then the bound in
Theorem 3.3(ii) can be simplified to $O_p\{(n b_n h_n^d)^{-1/2} (\log n)^{1/2}\}$. Theorem 3.4
provides a uniform bound for (1.4) and (1.5).


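Theorem 3.4. Assume (A1), (A3)-(A5), $h_n \to 0$ and $n h_n^d \to \infty$. (i) If
there exists $r > r' > 0$ such that $\sup_{t \in [0,1]} \|G(t; \mathcal{H}_0)\|_r < \infty$ and
$\ldots \to 0$, then

$\sup_{\mathbf{u} \in \mathbb{R}^d} |\hat f_n(\mathbf{u}) - E\{\hat f_n(\mathbf{u})\}| = O_p\Big\{ \frac{(\log n)^{1/2}}{(n h_n^d)^{1/2}} \Big\}.$

(ii) If $\eta_i \in \mathcal{L}^p$ for some $p > 2$, and $\ldots \to 0$, then

$\sup_{\mathbf{u} \in \mathcal{X}} \Big| \hat\mu_n(\mathbf{u}) - \frac{E\{\hat T_n(\mathbf{u})\}}{E\{\hat f_n(\mathbf{u})\}} \Big| = O_p\Big\{ \frac{(\log n)^{1/2}}{(n h_n^d)^{1/2}} + \frac{n^{1/p} \log n}{n h_n^d} \Big\}.$
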
If the bandwidth $h_n \asymp n^{-1/(d+4)}$ is AMSE-optimal, and $\eta_i \in \mathcal{L}^p$ for some
$p > (d+4)/2$, then the bound in Theorem 3.4(ii) can be simplified to
$O_p\{(n h_n^d)^{-1/2} (\log n)^{1/2}\}$.


3.3. Model selection. Model I is quite general in the sense that it does not
impose any specific parametric form on the regression function and allows it
to change over time. However, in practice it is useful to check whether model
I can be reduced to its simpler special cases, namely models II-IV. Model
selection between models II and IV, or between models III and IV, has been
studied in the literature mainly by means of hypothesis testing; see references
in Section 1. Nevertheless, less attention has been paid to distinguishing
between models II and III. We shall here propose an information criterion
that can consistently select the underlying true model among candidate
models I-IV. Let $\mathcal{T} \subset (0,1)$ be a compact set and $\mathcal{T}_n = \{i = 1,\ldots,n \mid i/n \in \mathcal{T}\}$.
We consider the restricted residual sum of squares for model I, which
takes the form

$\mathrm{RSS}_n(\mathcal{X}, \mathcal{T}, \mathrm{I}) = \sum_{i \in \mathcal{T}_n} \{y_i - \hat m_n(\mathbf{x}_i, i/n)\}^2 \mathbf{1}_{\{\mathbf{x}_i \in \mathcal{X}\}},$

where $\mathbf{1}_{\{\cdot\}}$ is the indicator function. Similarly, we can define $\mathrm{RSS}_n(\mathcal{X}, \mathcal{T}, \mathrm{II})$,
$\mathrm{RSS}_n(\mathcal{X}, \mathcal{T}, \mathrm{III})$ and $\mathrm{RSS}_n(\mathcal{X}, \mathcal{T}, \mathrm{IV})$ for models II-IV, respectively. For the
simple linear regression model IV, the parameter $\theta$ can be estimated by the
least squares estimate


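(3.5)  $\hat\theta_n = \Big( \sum_{i=1}^n \mathbf{x}_i \mathbf{x}_i^\top \Big)^{-1} \sum_{i=1}^n \mathbf{x}_i y_i.$

For the time-varying coefficient model III, let $K_{T,b_n}(\cdot) = b_n^{-1} K_T(\cdot/b_n)$, and
we can use the kernel estimator of Priestley and Chao (1972),

(3.6)  $\hat\beta_n(t) = \Big\{ \frac{1}{n} \sum_{i=1}^n \mathbf{x}_i \mathbf{x}_i^\top K_{T,b_n}(i/n - t) \Big\}^{-1} \Big\{ \frac{1}{n} \sum_{i=1}^n \mathbf{x}_i y_i K_{T,b_n}(i/n - t) \Big\}.$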
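
For a single predictor ($d = 1$) the matrix inverse in the kernel estimator for model III reduces to a scalar division, which makes the estimator easy to sketch. The function names below and the Epanechnikov kernel are assumptions for illustration only.

```python
def epanechnikov(v):
    """Epanechnikov kernel supported on [-1, 1] (an illustrative choice)."""
    return 0.75 * (1.0 - v * v) if abs(v) <= 1.0 else 0.0


def tv_coefficient(t, x, y, b, kernel=epanechnikov):
    """Kernel estimate of beta(t) in y_i = x_i * beta(i/n) + e_i, scalar x."""
    n = len(x)
    num = 0.0  # (1/n) * sum of x_i * y_i * K_{T,b}(i/n - t)
    den = 0.0  # (1/n) * sum of x_i^2   * K_{T,b}(i/n - t)
    for i, (xi, yi) in enumerate(zip(x, y), start=1):
        k = kernel((i / n - t) / b) / b
        num += xi * yi * k / n
        den += xi * xi * k / n
    return num / den
```

With noise-free data generated from a constant coefficient the sketch returns that coefficient exactly, and with a smoothly varying coefficient it returns a local weighted average around time $t$.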


For a candidate model $\varrho \in \{\mathrm{I}, \mathrm{II}, \mathrm{III}, \mathrm{IV}\}$, we define the generalized
information criterion

(3.7)  $\mathrm{GIC}_{\mathcal{X},\mathcal{T}}(\varrho) = \log\{\mathrm{RSS}_n(\mathcal{X}, \mathcal{T}, \varrho)/n\} + \tau_n\,\mathrm{df}(\varrho),$

where $\tau_n$ is a tuning parameter indicating the amount of penalization and
$\mathrm{df}(\varrho)$ represents the model complexity for model $\varrho \in \{\mathrm{I}, \mathrm{II}, \mathrm{III}, \mathrm{IV}\}$,
determined as follows. For the simple linear regression model IV, following the
convention we set the model complexity or degree of freedom to be the number
of potential predictors, namely $\mathrm{df}(\mathrm{IV}) = d$. For the time-varying
coefficient model III, the effective number of parameters used in kernel smoothing
is $b_n^{-1}$ for each one of the $d$ predictors [see, e.g., Hurvich, Simonoff and Tsai
(1998)], and thus we set $\mathrm{df}(\mathrm{III}) = b_n^{-1} d$. Let $\mathrm{IQR}_k$, $k = 1,\ldots,d$, be the
componentwise interquartile ranges of $(\mathbf{x}_i)$, and motivated by the same spirit as in
Hurvich, Simonoff and Tsai (1998), we set $\mathrm{df}(\mathrm{II}) = h_n^{-d} \prod_{k=1}^d (2\,\mathrm{IQR}_k)$ and
$\mathrm{df}(\mathrm{I}) = (b_n h_n^d)^{-1} \prod_{k=1}^d (2\,\mathrm{IQR}_k)$, where $2\,\mathrm{IQR} = 1$ for random variables having
a uniform distribution on $[0,1]$. The final model is selected by minimizing
the information criterion (3.7). We shall make the following assumption:

(A6) eigenvalues of $M(G,t) = E\{G(t; \mathcal{H}_0) G(t; \mathcal{H}_0)^\top\}$ are bounded away
from zero and infinity on $[0,1]$.
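
The selection rule based on (3.7) can be sketched as follows. This is a minimal illustration, assuming the complexity terms take the form $\mathrm{df}(\mathrm{IV}) = d$, $\mathrm{df}(\mathrm{III}) = d/b_n$, $\mathrm{df}(\mathrm{II}) = h_n^{-d} \prod_k (2\,\mathrm{IQR}_k)$ and $\mathrm{df}(\mathrm{I}) = (b_n h_n^d)^{-1} \prod_k (2\,\mathrm{IQR}_k)$; the function names and the dictionary layout are hypothetical, and the restricted residual sums of squares are taken as given inputs.

```python
import math


def model_df(model, d, b=None, h=None, iqr=None):
    """Model complexity for candidates "I", "II", "III", "IV" (assumed forms)."""
    if model == "IV":
        return float(d)
    if model == "III":
        return d / b
    volume = math.prod(2.0 * q for q in iqr)  # product of 2 * IQR_k
    if model == "II":
        return volume / h ** d
    if model == "I":
        return volume / (b * h ** d)
    raise ValueError("unknown model: " + str(model))


def gic(rss, n, tau, df):
    """Generalized information criterion: log(RSS / n) + tau * df."""
    return math.log(rss / n) + tau * df


def select_model(rss_by_model, n, tau, df_by_model):
    """Return the candidate with the smallest criterion value."""
    return min(rss_by_model,
               key=lambda m: gic(rss_by_model[m], n, tau, df_by_model[m]))
```

The penalty trades goodness of fit against complexity: a richer model always lowers the residual sum of squares, so it is selected only when that reduction outweighs $\tau_n$ times its extra degrees of freedom.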

In order to establish the selection consistency of (3.7), in addition to the
results developed in Sections 3.1 and 3.2 regarding models I and II, we need
the following conditions on estimators (3.5) and (3.6) for models IV and III,
respectively:

(P1) There exists a nonrandom sequence $\theta_n$ such that $\hat\theta_n - \theta_n = O_p(n^{-1/2})$.
If model IV is correctly specified, then $\theta_n$ can be replaced by the true value $\theta_0$.

(P2) There exists a sequence of nonrandom functions $\beta_n : [0,1] \to \mathbb{R}^d$ such
that

$\sup_{t \in \mathcal{T}} |\hat\beta_n(t) - \beta_n(t)| = O_p(\phi_n),$

where $\phi_n = (n b_n)^{-1/2} (\log n)^{1/2} + b_n^2$. If model III is correctly specified, then
$\beta_n(\cdot)$ can be replaced by the true coefficient function $\beta_0(\cdot)$ and

$\sup_{t \in \mathcal{T}} \Big| M(G,t) \Big\{ \hat\beta_n(t) - \beta_0(t) - \frac{\kappa_T b_n^2 \beta_0''(t)}{2} \Big\} - \frac{1}{n} \sum_{i=1}^n \mathbf{x}_i e_i K_{T,b_n}(i/n - t) \Big| = O_p(\phi_n^2),$


where XjCi G C^, i = 1,... ,n. 


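Remark 3.1. Conditions (P1) and (P2) can be verified for locally
stationary processes with short-range dependence. For example, for the
linear regression model IV, by Lemma 5.1 of Zhang and Wu (2012), we have
$\sum_{i=1}^n \{\mathbf{x}_i \mathbf{x}_i^\top - E(\mathbf{x}_i \mathbf{x}_i^\top)\} = O_p(n^{1/2})$ and $\sum_{i=1}^n \{\mathbf{x}_i y_i - E(\mathbf{x}_i y_i)\} = O_p(n^{1/2})$.
Hence we can use

$\theta_n = \Big\{ \sum_{i=1}^n E(\mathbf{x}_i \mathbf{x}_i^\top) \Big\}^{-1} \sum_{i=1}^n E(\mathbf{x}_i y_i),$

which equals $\theta_0$ if $y_i = \mathbf{x}_i^\top \theta_0 + e_i$, $i = 1,\ldots,n$. This verifies condition
(P1). For the time-varying coefficient model III, by Lemma 5.3 of Zhang
and Wu (2012), we have $\sup_{t \in \mathcal{T}} |n^{-1} \sum_{i=1}^n \{\mathbf{x}_i \mathbf{x}_i^\top - E(\mathbf{x}_i \mathbf{x}_i^\top)\} K_{T,b_n}(i/n - t)| = O_p(\phi_n)$
and $\sup_{t \in \mathcal{T}} |n^{-1} \sum_{i=1}^n \{\mathbf{x}_i y_i - E(\mathbf{x}_i y_i)\} K_{T,b_n}(i/n - t)| = O_p(\phi_n)$.
Hence we can use

$\beta_n(t) = \Big\{ \frac{1}{n} \sum_{i=1}^n E(\mathbf{x}_i \mathbf{x}_i^\top) K_{T,b_n}(i/n - t) \Big\}^{-1} \Big\{ \frac{1}{n} \sum_{i=1}^n E(\mathbf{x}_i y_i) K_{T,b_n}(i/n - t) \Big\}$

and condition (P2) follows by the proof of Theorem 3 in Zhou and Wu (2010).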

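Recall that the AMSE optimal bandwidths satisfy $b_n(\mathrm{I}) \asymp n^{-1/(d+5)}$ and
$h_n(\mathrm{I}) \asymp n^{-1/(d+5)}$ for model I, $h_n(\mathrm{II}) \asymp n^{-1/(d+4)}$ for model II and
$b_n(\mathrm{III}) \asymp n^{-1/5}$ for model III. Theorem 3.5 provides the selection consistency of the
information criterion (3.7), where the true model is denoted by $\varrho_0$.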

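Theorem 3.5. Assume (A1), (A3)-(A6) with $q > (3d+5)/(d+2)$, (P1),
(P2), $\eta_i \in \mathcal{L}^p$ for some $p > (d+5)/2$, $i = 1,\ldots,n$, and bandwidths with
optimal AMSE rates are used for models I-III. If

$\ldots \to 0, \qquad \ldots \to \infty,$

then for any $\varrho_1 \in \{\mathrm{I}, \mathrm{II}, \mathrm{III}, \mathrm{IV}\}$ and $\varrho_1 \ne \varrho_0$, we have

$\mathrm{pr}\{\mathrm{GIC}_{\mathcal{X},\mathcal{T}}(\varrho_0) < \mathrm{GIC}_{\mathcal{X},\mathcal{T}}(\varrho_1)\} \to 1.$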

3.4. Extensions. Recall that in Theorems 3.1-3.5 the error process (2.4) has
i.i.d. $\eta_i$, which are also independent of $(\mathbf{x}_j)$. In Section 3.4.1 we allow serially
correlated $\eta_i$. Section 3.4.2 concerns time-varying autoregressive processes
in which $(\eta_i)$ and $(\mathbf{x}_j)$ are naturally dependent.

3.4.1. Models with serially correlated errors. To allow errors with serial
correlation, similarly to (2.3) we assume that

(3.8)  $\eta_i = L(i/n; \mathcal{J}_i),$

where $\mathcal{J}_i = (\ldots, \zeta_{i-1}, \zeta_i)$ with $\zeta_k$, $k \in \mathbb{Z}$, being i.i.d. random variables and
independent of $\xi_j$, $j \in \mathbb{Z}$. Therefore, $(\eta_i)$ is a dependent nonstationary
process that is independent of $(\mathbf{x}_j)$, and the error process $e_i = \sigma(\mathbf{x}_i, i/n)\eta_i$ can
exhibit both serial correlation and heteroscedasticity; see Robinson (1983),
Orbe, Ferreira and Rodriguez-Poo (2005, 2006) and references therein for
similar error structures. Let $\zeta_i'$, $\zeta_j$, $i, j \in \mathbb{Z}$, be i.i.d. and
$\mathcal{J}_k' = (\ldots, \zeta_{-1}, \zeta_0', \zeta_1, \ldots, \zeta_k)$. Assume $\sup_{t \in [0,1]} \|L(t; \mathcal{J}_0)\|_q < \infty$, and define the functional
dependence measure

$\theta_{k,q} = \sup_{t \in [0,1]} \|L(t; \mathcal{J}_k) - L(t; \mathcal{J}_k')\|_q.$

The following theorem states that the results presented in Sections 3.1-3.3
will continue to hold (except for a difference of $\log n$ on the uniform bounds)
if the process $(\eta_i)$ in (3.8) satisfies the geometric moment contraction (GMC)
condition [Shao and Wu (2007)]. The proof is available in the supplementary
material [Zhang and Wu (2015)].

Theorem 3.6. Assume that the process $(\eta_i)$ in (3.8) satisfies $\theta_{k,q} = O(\rho^k)$
for some $0 < \rho < 1$. Then the results of Theorems 3.1-3.5 will continue
to hold except that the uniform bounds in Theorems 3.3(ii) and 3.4(ii) will
be multiplied by a factor of $\log n$.


3.4.2. Time-varying nonlinear autoregressive models. In this section we
shall consider the autoregressive version of (1.1),

(3.9)  $y_i = m(\mathbf{x}_i, i/n) + \sigma(\mathbf{x}_i, i/n)\eta_i, \qquad \mathbf{x}_i = (y_{i-1}, \ldots, y_{i-d})^\top, \quad i = 1,\ldots,n,$

where $\eta_i$ are i.i.d. random variables with $E(\eta_i) = 0$ and $E(\eta_i^2) = 1$. We can
view (3.9) as a time-varying or locally stationary autoregressive process,
and the corresponding shift processes are $\mathcal{F}_k = (\ldots, \eta_{k-1}, \eta_k)$ and $\mathcal{H}_k = \mathcal{F}_{k-1}$.
We shall here present analogous versions of Theorems 3.1-3.5. Note that in
this case $\mathbf{x}_i$ cannot be written in the form of (2.3). However, Proposition 3.1
implies that it can be well approximated by a process in the form of (2.3).
For each $t \in [0,1]$, we define the process $\{y_i(t)\}_{i \in \mathbb{Z}}$ by

(3.10)  $y_i(t) = m\{\mathbf{x}_i(t), t\} + \sigma\{\mathbf{x}_i(t), t\}\eta_i, \qquad \mathbf{x}_i(t) = \{y_{i-1}(t), \ldots, y_{i-d}(t)\}^\top.$

Lemma 3.1. Assume that there exist constants $a_1,\ldots,a_d \ge 0$ with
$\sum_{j=1}^d a_j < 1$ such that, for all $\mathbf{x} = (x_1,\ldots,x_d)^\top$ and $\mathbf{x}' = (x_1',\ldots,x_d')^\top$,

(3.11)  $\sup_{0 \le t \le 1} \| [m(\mathbf{x},t) + \sigma(\mathbf{x},t)\eta_i] - [m(\mathbf{x}',t) + \sigma(\mathbf{x}',t)\eta_i] \|_p \le \sum_{j=1}^d a_j |x_j - x_j'|.$

Then (i) the recursion (3.10) has a stationary solution of the form $y_i(t) = g(t; \mathcal{F}_i)$
which satisfies the geometric moment contraction (GMC) property:
for some $\rho \in (0,1)$,

$\sup_{0 \le t \le 1} \delta_i(t) = O(\rho^i), \qquad \delta_i(t) = \|g(t; \mathcal{F}_i) - g(t; \mathcal{F}_i')\|_p.$

(ii) If in (3.9) the initial values $(y_0, y_{-1}, \ldots, y_{1-d})^\top = \mathbf{x}_1(0)$, then $y_i$ can be
written in the form $g_i(\mathcal{F}_i)$, where $g_i(\cdot)$ is a measurable function, and it also
satisfies the GMC property

(3.12)  $\sup_{i \le n} \| y_i - g_i(\ldots, \eta_{i-k-2}', \eta_{i-k-1}', \eta_{i-k}', \eta_{i-k+1}, \ldots, \eta_i) \|_p = O(\rho^k).$

Lemma 3.1(i) concerns the stationarity of the process $\{y_i(t)\}_{i \in \mathbb{Z}}$, which
follows from Theorem 5.1 of Shao and Wu (2007). For (ii), denote by $\vartheta_k$
the left-hand side of (3.12). Then by (3.11), $\vartheta_k$ satisfies $\vartheta_k \le \sum_{j=1}^d a_j \vartheta_{k-j}$,
implying (3.12) via recursion.

For presentational simplicity suppose we observe $y_{1-d}, y_{2-d}, \ldots, y_n$ from
model (3.9) with the initial values $(y_0, y_{-1}, \ldots, y_{1-d})^\top = \mathbf{x}_1(0)$. Estimates (2.1)
and (2.2) can be computed in the same way. Proposition 3.1 implies that, for
$i$ such that $i/n \approx u$, the process $(\mathbf{x}_i)_i$ can be approximated by the stationary
process $\{\mathbf{x}_i(u)\}_i$, thus suggesting local strict stationarity. The proof is
available in the supplementary material [Zhang and Wu (2015)].

Proposition 3.1. Let $G_\eta(\mathbf{x},t) = m(\mathbf{x},t) + \sigma(\mathbf{x},t)\eta$ and $\dot G_\eta(\mathbf{x},t) = \partial G_\eta(\mathbf{x},t)/\partial t$.
Assume (3.11) and

$\sup_{0 \le t \le 1} \sup_{0 \le u \le 1} \| \dot G_{\eta_i}\{\mathbf{x}_i(u), t\} \|_p < \infty.$

Then $\|\mathbf{x}_i - \mathbf{x}_i(u)\|_p = O(n^{-1} + |u - i/n|)$.

Let $f(\mathbf{u},t)$ be the density of $\mathbf{x}_i(t) = \{y_{i-1}(t), \ldots, y_{i-d}(t)\}^\top$ and $f_\eta$ be the
density of $\eta_i$. Theorem 3.7 serves as an analogous version of Theorems 3.1-3.4,
and the proof is available in the supplementary material [Zhang and Wu
(2015)].

Theorem 3.7. Assume (A1), (A5) and $\sup_w \{f_\eta(w) + |f_\eta'(w)|\} < \infty$. Let
the conditions in Lemma 3.1 and Proposition 3.1 be satisfied. Then under
respective conditions in Theorems 3.1-3.5, the corresponding conclusions also
hold, respectively.

4. Numerical implementation. 

4.1. Bandwidth and tuning parameter selection. Selecting bandwidths
that optimize the performance of (3.7) can be quite nontrivial, and in our
case, it is further complicated by the presence of dependence and
nonstationarity. Assuming independence, the problem of bandwidth selection has been
considered for model II by Härdle and Marron (1985), Härdle, Hall and
Marron (1988), Park and Marron (1990), Ruppert, Sheather and Wand (1995),
Wand and Jones (1995), Xia (1998) and Gao and Gijbels (2008), among
others. Hoover et al. (1998), Fan and Zhang (2000a) and Ramsay and
Silverman (2005) considered the problem for model III for longitudinal data,
where multiple independent realizations are available. For the time-varying
kernel density estimator (2.1) with independent observations, Hall, Müller
and Wu (2006) coupled the selection of spatial and temporal bandwidths
and adopted the least squares cross validation [Silverman (1986)].
Nevertheless, bandwidth selectors derived under independence can break down
for dependent data [Wang (1998) and Opsomer, Wang and Yang (2001)].
We propose using the AMSE optimal bandwidths $b_n(\mathrm{I}) = c_b(\mathrm{I}) n^{-1/(d+5)}$ and
$h_n(\mathrm{I}) = c_h(\mathrm{I}) n^{-1/(d+5)}$ for model I, $h_n(\mathrm{II}) = c_h(\mathrm{II}) n^{-1/(d+4)}$ for model II and
$b_n(\mathrm{III}) = c_b(\mathrm{III}) n^{-1/5}$ for model III, where $0 < c_b(\mathrm{I}), c_h(\mathrm{I}), c_h(\mathrm{II}), c_b(\mathrm{III}) < \infty$
are constants. Due to the presence of both dependence and nonstationarity,
estimation of these constants is difficult. Throughout this section, as a
rule of thumb, we use $c_b(\mathrm{I}) = c_b(\mathrm{III}) = 1/2$ and $c_h(\mathrm{I}) = c_h(\mathrm{II}) = \prod_{k=1}^d \mathrm{IQR}_k$.
Our numerical examples suggest that these simple choices have a reasonably
good performance.
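
For a single predictor ($d = 1$) the rule of thumb reduces to a few lines. The sketch below is illustrative only: the function name and the returned dictionary keys are hypothetical, and the interquartile range is computed with the default convention of the standard library's `statistics.quantiles`, which is one of several possible quantile definitions.

```python
import statistics


def rule_of_thumb_bandwidths(x):
    """Rule-of-thumb bandwidths for scalar predictors (d = 1).

    Temporal constants are set to 1/2 and spatial constants to the IQR of x.
    """
    n = len(x)
    q1, _, q3 = statistics.quantiles(x, n=4)  # quartile cut points
    iqr = q3 - q1
    return {
        "b_I": 0.5 * n ** (-1.0 / 6.0),    # model I, temporal: n^{-1/(d+5)}
        "h_I": iqr * n ** (-1.0 / 6.0),    # model I, spatial:  n^{-1/(d+5)}
        "h_II": iqr * n ** (-1.0 / 5.0),   # model II, spatial: n^{-1/(d+4)}
        "b_III": 0.5 * n ** (-1.0 / 5.0),  # model III, temporal: n^{-1/5}
    }
```

Model I smooths in both time and space, so its spatial bandwidth shrinks more slowly with $n$ than the one used for model II.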

We shall here discuss the choice of the tuning parameter $\tau_n$ that controls
the amount of penalization on model complexities. The problem has
been extensively studied for the linear model IV by Akaike (1973), Mallows
(1973), Schwarz (1978), Shao (1997) and Yang (2005), among others. For
the generalized information criterion (3.7), given conditions in Theorem 3.5,
one can choose $\tau_n = c \cdots \log n$, where $c > 0$ is a constant, which
satisfies all the required conditions and thus guarantees the selection
consistency. Note that the choice of $c$ does not affect the asymptotic result,
namely the proposed method will select the true model for any given $c > 0$
as long as the sample size is large enough; see Theorem 3.5. Therefore, one
can simply use $c = 1$ to devise a consistent model selection procedure. As
an alternative, following Fan and Li (2001) and Tibshirani and Tibshirani
(2009), we shall here consider a data-driven selector based on the $K$-fold
cross-validation (CV). In particular, we first split the data into $K$ parts,
denoted by $\mathcal{D}_1, \ldots, \mathcal{D}_K$, then for each $k = 1,\ldots,K$, we remove the $k$th part
from the data and use the information criterion (3.7) to select the model,
based on which predictions can be made for the removed part and are
denoted by $\hat y_i^{-k}(c)$, $i \in \mathcal{D}_k$. The selected value of $c$ is obtained by minimizing the
cross-validation criterion

$\mathrm{CV}(c) = \sum_{k=1}^K \sum_{i \in \mathcal{D}_k} \{y_i - \hat y_i^{-k}(c)\}^2.$

It can be seen from the simulation results in Section 4.2 that this CV-based
tuning parameter selector performs reasonably well.
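
The cross-validation step can be sketched generically. In the sketch below the callback `select_and_predict(train, test, c)` is a hypothetical placeholder standing for "select a model by the information criterion with constant c on the training indices, then predict the held-out responses"; the interleaved assignment of observations to folds is also an assumption, since the text does not specify how the data are split.

```python
def cv_score(y, c, select_and_predict, n_folds=10):
    """Sum of squared prediction errors over the held-out folds for a given c."""
    n = len(y)
    score = 0.0
    for k in range(n_folds):
        test = [i for i in range(n) if i % n_folds == k]   # fold k (assumed interleaved)
        train = [i for i in range(n) if i % n_folds != k]
        preds = select_and_predict(train, test, c)
        score += sum((y[i] - p) ** 2 for i, p in zip(test, preds))
    return score


def choose_c(y, candidates, select_and_predict, n_folds=10):
    """Return the candidate constant that minimizes the cross-validation score."""
    return min(candidates,
               key=lambda c: cv_score(y, c, select_and_predict, n_folds))
```

Any grid of positive candidate values can be passed to `choose_c`; the grid itself is a tuning choice left to the user.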
4.2. Simulation results. We shall in this section carry out a simulation
study to examine the finite-sample performance of the generalized
information criterion (3.7). Let $d = 1$ and $\xi_i$, $i \in \mathbb{Z}$, and $\eta_j$, $j \in \mathbb{Z}$, be i.i.d.
standard normal, $a(t) = (t - 1/2)^2$, $t \in [0,1]$, and $G(t; \mathcal{H}_k) = \xi_k + a(t)\xi_{k-1}$,
$k \in \mathbb{Z}$, $t \in [0,1]$. For the regressor and error processes with $x_i = G(i/n; \mathcal{H}_i)$
and $e_i = \sigma(x_i, i/n)\eta_i$, $i = 1,\ldots,n$, we consider model (1.1) with the following
four specifications:

(a) $m(x,t) = 2.5 \sin(2\pi t) \cos(\pi x)$ and $\sigma(x,t) = \varphi |tx|/2$;

(b) $m(x,t) = \exp(x)$ and $\sigma(x,t) = \varphi \exp(x/3)$;

(c) $m(x,t) = 5t + 4\cos(2\pi t) x$ and $\sigma(x,t) = \varphi \exp(tx/2)$;

(d) $m(x,t) = 2 + 3x$ and $\sigma(x,t) = \varphi |x/3 + t|$,

where $\varphi > 0$ is a constant indicating the noise level. Cases (a)-(d)
correspond to models I-IV, respectively, and their signal-to-noise ratios (SNRs)
are roughly of the same order given the same $\varphi$. The Epanechnikov kernel
$K(v) = 3(1 - v^2)/4$, $v \in [-1,1]$, is used hereafter for both the spatial and
temporal kernel functions. Let $\mathcal{X} = [-2,2]$ and $\mathcal{T} = [0.2,0.8]$. The tuning
parameter is selected by using the tenfold CV-based method described in
Section 4.1. The results are summarized in Table 1 for different noise levels
$\varphi \in \{1,2,3\}$ and sample sizes $n = 2^k \times 250$, $0 \le k \le 3$. For each configuration,
the results are based on 1000 simulated realizations of models (a)-(d).
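
The data-generating mechanism of this simulation design can be sketched as follows. Several points are assumptions made for the sketch rather than facts stated in the text: the noise function in case (b) is taken to scale with the noise level like the other three cases, the SNR of a realization is taken as the square root of the ratio of the signal sum of squares to the error sum of squares, and the function names and seeding scheme are hypothetical.

```python
import math
import random


def a(t):
    """Time-varying moving-average coefficient a(t) = (t - 1/2)^2."""
    return (t - 0.5) ** 2


# (regression function m, noise function sigma) for cases (a)-(d).
SPECS = {
    "a": (lambda x, t: 2.5 * math.sin(2 * math.pi * t) * math.cos(math.pi * x),
          lambda x, t, phi: phi * abs(t * x) / 2),
    "b": (lambda x, t: math.exp(x),
          lambda x, t, phi: phi * math.exp(x / 3)),  # phi scaling assumed
    "c": (lambda x, t: 5 * t + 4 * math.cos(2 * math.pi * t) * x,
          lambda x, t, phi: phi * math.exp(t * x / 2)),
    "d": (lambda x, t: 2 + 3 * x,
          lambda x, t, phi: phi * abs(x / 3 + t)),
}


def simulate(case, n, phi, seed=0):
    """Draw one realization; returns predictors, responses, signal and errors."""
    rng = random.Random(seed)
    m, sigma = SPECS[case]
    xi_prev = rng.gauss(0.0, 1.0)  # innovation xi_0
    x, y, signal, noise = [], [], [], []
    for i in range(1, n + 1):
        t = i / n
        xi = rng.gauss(0.0, 1.0)
        xv = xi + a(t) * xi_prev                      # x_i = G(i/n; H_i)
        e = sigma(xv, t, phi) * rng.gauss(0.0, 1.0)   # e_i = sigma(x_i, i/n) * eta_i
        s = m(xv, t)
        x.append(xv)
        y.append(s + e)
        signal.append(s)
        noise.append(e)
        xi_prev = xi
    return x, y, signal, noise


def snr(signal, noise):
    """Signal-to-noise ratio of one realization (assumed definition)."""
    return math.sqrt(sum(s * s for s in signal) / sum(e * e for e in noise))
```

Because the errors are proportional to the noise level, doubling it halves the SNR of a realization generated from the same random draws.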

It can be seen from Table 1 that the proposed model selection procedure
performs reasonably well as it has very high empirical probabilities of
identifying the true model, even when the sample size is moderate to small.
For example, if the sample size $n = 250$, which is usually considered to be
small for conducting time-varying nonparametric inference, and the data
are generated by model (a) with $\varphi = 1$, then 967 out of 1000 realizations
are correctly identified as the time-varying nonparametric regression model
I, while 33 out of 1000 realizations are under-fitted as the simple linear
regression model IV. Hence, for each combination of $n$ and $\varphi$, in the ideal
case, we expect the block to have unit diagonal components and zero
off-diagonal components. For each configuration, medians of the SNR are also
reported, where for each realization $y_i = m_i(x_i) + e_i$, $i = 1,\ldots,n$, the SNR
is defined as $\{\sum_{i=1}^n m_i(x_i)^2 / \sum_{i=1}^n e_i^2\}^{1/2}$. The proposed
model selection procedure with the CV-based tuning parameter selector has
a reasonably robust performance with respect to the noise level, and the
performance improves quickly if we increase the sample size. Note that a
sample size of 1000 is considered to be reasonable if one would like to
conduct time-varying nonparametric inference.

4.3. Application on modeling interest rates. Modeling interest rates is 
an important problem in finance. In Black and Scholes (1973) and Merton 
(1974) interest rates were assumed to be constants. A popular model is the 


Table 1 

Proportions of selecting models I-IV for different combinations of noise levels ip, sample sizes n and model specifications (a)-(d) with 
1000 replications for each configuration. Medians of the SNR are also reported, where for each realization yi = mi{xi) + d, i = l,... ,n, 

the SNR is defined as 

                       p = 1                               p = 2                               p = 3
                       Selected model                      Selected model                      Selected model
   n  Case     SNR      I     II    III     IV     SNR      I     II    III     IV     SNR      I     II    III     IV
 250  (a)     4.36  0.967  0.000  0.000  0.033    2.16  0.920  0.000  0.000  0.080    1.45  0.840  0.000  0.000  0.160
      (b)     4.09  0.116  0.882  0.000  0.002    2.04  0.119  0.857  0.000  0.024    1.36  0.132  0.784  0.002  0.082
      (c)     3.73  0.016  0.000  0.984  0.000    1.86  0.032  0.000  0.968  0.000    1.24  0.032  0.000  0.968  0.000
      (d)     5.44  0.017  0.043  0.005  0.935    2.72  0.014  0.040  0.001  0.945    1.82  0.024  0.040  0.003  0.933
 500  (a)     4.29  0.985  0.000  0.000  0.015    2.15  0.945  0.000  0.000  0.055    1.44  0.896  0.000  0.000  0.104
      (b)     4.17  0.044  0.949  0.000  0.008    2.08  0.058  0.906  0.000  0.036    1.40  0.037  0.926  0.000  0.037
      (c)     3.71  0.001  0.000  0.999  0.000    1.86  0.008  0.000  0.992  0.000    1.24  0.012  0.000  0.988  0.000
      (d)     5.42  0.007  0.037  0.000  0.956    2.71  0.012  0.042  0.001  0.945    1.81  0.005  0.026  0.006  0.963
1000  (a)     4.29  0.994  0.000  0.000  0.006    2.15  0.970  0.000  0.000  0.030    1.44  0.921  0.000  0.000  0.079
      (b)     4.17  0.004  0.992  0.000  0.004    2.08  0.005  0.975  0.000  0.020    1.40  0.015  0.957  0.000  0.028
      (c)     3.71  0.000  0.000  1.000  0.000    1.86  0.001  0.000  0.999  0.000    1.24  0.004  0.000  0.996  0.000
      (d)     5.42  0.001  0.028  0.002  0.969    2.71  0.002  0.024  0.003  0.971    1.81  0.001  0.025  0.002  0.972
2000  (a)     4.29  0.999  0.000  0.000  0.001    2.15  0.979  0.000  0.000  0.021    1.44  0.948  0.000  0.000  0.052
      (b)     4.17  0.000  0.997  0.000  0.003    2.08  0.000  0.982  0.000  0.018    1.40  0.000  0.965  0.000  0.035
      (c)     3.71  0.000  0.000  1.000  0.000    1.86  0.000  0.000  1.000  0.000    1.24  0.001  0.000  0.999  0.000
      (d)     5.42  0.000  0.014  0.001  0.985    2.71  0.000  0.014  0.001  0.985    1.81  0.000  0.014  0.000  0.986

Fig. 1. Time series plots for the U.S. daily treasury yield curve rates with six-month 
(solid black) and two-year (dashed grey) maturities. 


time-homogeneous diffusion process (1.3) with linear drift function; see, for 
example, Vasicek (1977), Courtadon (1982), Cox, Ingersoll and Ross (1985) 
and Chan et al. (1992). Its discretized version is given by model IV.
Aït-Sahalia (1996), Stanton (1997) and Liu and Wu (2010) considered model
(1.3) with nonlinear drift function, which relates to model II. We consider the
daily U.S. treasury yield curve rates with six-month and two-year maturities 
during 01/02/1990-12/31/2010. The data can be obtained from the U.S. 
Department of the Treasury website at http://www.treasury.gov/. Both 
series contain n = 5256 daily rates, and their time series plots are shown in 
Figure 1. 

We shall here model the data by the time-varying diffusion process (1.2),
and apply the proposed model selection procedure to determine the forms of
the drift functions. Let $x_i = r_{t_i}$ be the observation at day $i$. Since a year has
250 transaction days, $\Delta = t_i - t_{i-1} = 1/250$. Following Liu and Wu (2010),
we consider the following discretized version of (1.2):

(4.1)   $y_i = r_{t_{i+1}} - r_{t_i} = \mu(x_i, i/n)\Delta + \sigma(x_i, i/n)\Delta^{1/2}\eta_i,$

where $\eta_i = (\mathbb{B}_{t_{i+1}} - \mathbb{B}_{t_i})/\Delta^{1/2}$.
Note that $\eta_i$ are i.i.d. $N(0,1)$ random variables. We shall here write
$\mu(x_i, i/n)\Delta$ and $\sigma(x_i, i/n)\Delta^{1/2}$ in (4.1) as $m(x_i, i/n)$ and $\sigma(x_i, i/n)$ in the
sequel. Then specifications of Vasicek (1977) and Liu and Wu (2010) become
models IV and II, respectively.

For the treasury yield curve rates with six-month maturity, let $\mathcal{T} =
[0.2, 0.8]$ and $\mathcal{X} = [0.18, 7.89]$, which includes 95.5% of the daily rates $x_i$. The
selected bandwidths and tuning parameter are $b_n(\mathrm{I}) = 0.12$, $h_n(\mathrm{I}) = 0.82$,
$h_n(\mathrm{II}) = 0.62$, $b_n(\mathrm{III}) = 0.09$ and $\tau_n = 0.00090$. The results are summarized
in Table 2. Hence, the time-varying coefficient model III is selected, and we
conclude that the treasury yield curve rates with six-month maturity should
be modeled by (1.2) with $\mu(r_t, t) = \beta_0(t) + \beta_1(t) r_t$ for some smoothly
varying functions $\beta_0(\cdot)$ and $\beta_1(\cdot)$, which serves as a time-varying version of Chan
et al. (1992).

Table 2

Results of the model selection procedure based on the generalized information criterion
(3.7) for treasury yield rates with six-month and two-year maturity periods

              Six-month maturity                Two-year maturity
Model    log(RSS/n)      DF      GIC       log(RSS/n)      DF      GIC
I          -6.853      69.54   -6.790        -6.126      69.54   -6.063
II         -6.824      11.10   -6.814        -6.114      11.10   -6.104
III        -6.851      22.19   -6.831        -6.126      22.19   -6.106
IV         -6.822       2.00   -6.820        -6.113       2.00   -6.111

We then consider the treasury yield curve rates with two-year maturity.
Let $\mathcal{T} = [0.2, 0.8]$ and $\mathcal{X} = [0.67, 8.16]$, which includes 95.1% of the daily
rates $x_i$. The selected bandwidths and tuning parameter are $b_n(\mathrm{I}) = 0.12$,
$h_n(\mathrm{I}) = 0.75$, $h_n(\mathrm{II}) = 0.56$, $b_n(\mathrm{III}) = 0.09$ and $\tau_n = 0.00090$. Based on
Table 2, the linear regression model IV is selected. In comparison with the
results with six-month maturity, our analysis suggests that treasury yield
rates with longer maturity are more stable over time.
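
The selection reported in Table 2 can be retraced from the tabulated quantities. The sketch below assumes that the criterion (3.7) takes the form GIC = log(RSS/n) + $\tau_n$ DF, which is consistent with the penalty term $\tau_n$DF appearing in the proof of Theorem 3.5 and with the rounded entries of Table 2; the dictionary layout and the function names are illustrative only.

```python
# Reported log(RSS/n) and DF from Table 2, keyed by model.
TAU_N = 0.00090
DF = {"I": 69.54, "II": 11.10, "III": 22.19, "IV": 2.00}
LOG_RSS = {
    "six-month": {"I": -6.853, "II": -6.824, "III": -6.851, "IV": -6.822},
    "two-year": {"I": -6.126, "II": -6.114, "III": -6.126, "IV": -6.113},
}

def gic_table(log_rss, df, tau_n):
    """Assumed form of (3.7): GIC = log(RSS/n) + tau_n * DF."""
    return {model: log_rss[model] + tau_n * df[model] for model in log_rss}

def select_model(log_rss, df, tau_n):
    """Return the candidate model with the smallest GIC."""
    gic = gic_table(log_rss, df, tau_n)
    return min(gic, key=gic.get)

for maturity, log_rss in LOG_RSS.items():
    gic = {m: round(v, 3) for m, v in gic_table(log_rss, DF, TAU_N).items()}
    print(maturity, gic, "selected:", select_model(log_rss, DF, TAU_N))
```

Under this assumed form the minimizer is model III for the six-month series and model IV for the two-year series, in agreement with the text.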

5. Conclusion. The paper considers a time-varying nonparametric regression
model, namely model I, which is able to capture time-varying and
nonlinear relationships between the response variable and the explanatory
variables. It includes the popular nonparametric regression model II and
time-varying coefficient model III as special cases, and all of them are
generalizations of the simple linear regression model IV. In comparison with
existing results, the current paper makes two major contributions. First,
we develop an asymptotic theory on nonparametric estimation of the
time-varying regression model (1.1) under the new framework of Draghicescu,
Guillas and Wu (2009). Compared with the classical strong mixing conditions
as used by Vogt (2012), the current framework is convenient to work
with and often leads to optimal asymptotic results. In the proof, we use both
the martingale decomposition and the m-dependence approximation techniques
to obtain sharp results. Second, although the time-varying regression
model I is quite general by allowing a time-varying nonlinear relationship
between the response variable and the explanatory variables, it can be useful
in practice to check whether it can be reduced to its simpler special cases,
namely models II-IV, which have been extensively used in the literature.
However, existing results on model selection usually focused on distinguishing
between models II and IV and between models III and IV, and much
less attention has been paid to distinguishing between models II and III.
Note that models II and III are both generalizations of the simple linear
regression model IV but in completely different aspects, and therefore it is
desirable to have a statistically valid method to decide which generalization
(or the more general model I) should be used for a given data set.
The current paper fills this gap by proposing an information criterion (3.7)
in Section 3.3, which can be used to select the true model among candidate
models I-IV, and its selection consistency is provided by Theorem 3.5.
Therefore, the current paper sheds new light on distinguishing between
nonlinear and nonstationary generalizations of simple linear regression models,
and the results are applied to find appropriate models for short-term and
long-term interest rates.


6. Technical proofs. We shall in this section provide technical proofs for 
Theorems 3.1-3.5. Because of the time-varying feature and nonstationar- 
ity, the proofs are much more involved than existing ones for stationary 
processes. We shall here use techniques of martingale approximation and 
m-dependent approximation. Let e* = {$J and = (..., ej_i, e*) be 
the corresponding shift process. We define the projection operator 

$\mathcal{P}_k \cdot = E(\cdot \mid \mathcal{F}_k) - E(\cdot \mid \mathcal{F}_{k-1}), \quad k \in \mathbb{Z}.$

Throughout this section, $C > 0$ denotes a constant whose value may vary
from place to place. Let $a_{i,n}(u,t)$, $i = 1, \ldots, n$, be a triangular array of
deterministic nonnegative weight functions, $(u,t) \in \mathbb{R}^d \times [0,1]$. Lemma 6.1
provides a bound for the quantity

$Q_a(u,t) = \sum_{i=1}^n \{f_i(u, i/n \mid \mathcal{F}_{i-1}) - f(u, i/n)\} a_{i,n}(u,t),$

and is useful for proving Theorems 3.1-3.4.


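Lemma 6.1. Let $A_n(u,t) = \max_{1 \le i \le n} |a_{i,n}(u,t)|$ and define $\bar{A}_n(u,t) =$


Proof. Since $\mathcal{P}_k Q_a(u,t)$, $k \in \mathbb{Z}$, form a sequence of martingale
differences, we have

$\|Q_a(u,t)\|^2 = \sum_{k \in \mathbb{Z}} \Bigl\| \sum_{i=1}^n \mathcal{P}_k\{f_i(u, i/n \mid \mathcal{F}_{i-1})\} a_{i,n}(u,t) \Bigr\|^2 \le \sum_{k=-\infty}^{n} \Bigl( \sum_{i=1}^n \psi_{i-k-1,2} |a_{i,n}(u,t)| \Bigr)^2,$

and the result follows by observing that $\sum_{i=1}^n \psi_{i-k-1,2} |a_{i,n}(u,t)| \le
A_n(u,t)\Psi_{0,2}$ and $\sum_{k \in \mathbb{Z}} \sum_{i=1}^n \psi_{i-k-1,2} |a_{i,n}(u,t)| \le n\bar{A}_n(u,t)\Psi_{0,2}$. □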


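Lemma 6.2. Assume (A1)-(A3) and $\eta_i \in \mathcal{L}^p$, $p > 2$, $i = 1, \ldots, n$. (i) If
$b_n \to 0$, $h_n \to 0$ and $nb_nh_n^d \to \infty$, then for any $(u,t) \in \mathbb{R}^d \times (0,1)$,

$(nb_nh_n^d)^{1/2}[\hat{T}_n(u,t) - E\{\hat{T}_n(u,t)\}] \Rightarrow N[0, \{m(u,t)^2 + \sigma(u,t)^2\} f(u,t) \lambda_K],$

where $\lambda_K = \lambda_{K_S}\lambda_{K_T}$. (ii) If $h_n \to 0$ and $nh_n^d \to \infty$, then for any $u \in \mathbb{R}^d$,

$(nh_n^d)^{1/2}[\hat{T}_n(u) - E\{\hat{T}_n(u)\}] \Rightarrow N\Bigl[0, \lambda_{K_S} \int_0^1 \{m(u,t)^2 + \sigma(u,t)^2\} f(u,t)\,dt\Bigr].$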


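Proof. Write

$\hat{T}_n(u,t) - E\{\hat{T}_n(u,t)\} = M_n(u,t) + N_n(u,t),$

where

$M_n(u,t) = \sum_{i=1}^n [y_i K_{S,h_n}(u - x_i) - E\{y_i K_{S,h_n}(u - x_i) \mid \mathcal{F}_{i-1}\}] w_{b_n,i}(t)$

has summands of martingale differences, and

$N_n(u,t) = \sum_{i=1}^n [E\{y_i K_{S,h_n}(u - x_i) \mid \mathcal{F}_{i-1}\} - E\{y_i K_{S,h_n}(u - x_i)\}] w_{b_n,i}(t)$

is the remaining term. Let $a_{i,n}(u,t) = m(u, i/n) w_{b_n,i}(t)$; by Lemma 6.1,

$\|N_n(u,t)\| \le \int_{[-1,1]^d} K_S(s) \|Q_a(u - h_n s, t)\|\,ds = O\{(nb_n)^{-1/2}\}.$

We apply the martingale central limit theorem on $M_n(u,t)$ to show (i). Since

$\sum_{i=1}^n \bigl\| [y_i K_{S,h_n}(u - x_i) - E\{y_i K_{S,h_n}(u - x_i) \mid \mathcal{F}_{i-1}\}] w_{b_n,i}(t) \bigr\|_p^p \le 2^p \sum_{i=1}^n \|y_i K_{S,h_n}(u - x_i)\|_p^p\, w_{b_n,i}(t)^p = O\{(nb_nh_n^d)^{1-p}\},$

the Lindeberg condition is satisfied by observing that $p > 2$. Let

$L_n(s,t) = \sum_{i=1}^n \{m(s, i/n)^2 + \sigma(s, i/n)^2\}\{f_i(s, i/n \mid \mathcal{F}_{i-1}) - f(s, i/n)\} w_{b_n,i}(t)^2.$

Then by (A1) and Lemma 6.1,

$h_n^d \sum_{i=1}^n [E\{y_i^2 K_{S,h_n}(u - x_i)^2 \mid \mathcal{F}_{i-1}\} - E\{y_i^2 K_{S,h_n}(u - x_i)^2\}] w_{b_n,i}(t)^2 = \int_{[-1,1]^d} K_S(s)^2 L_n(u - h_n s, t)\,ds = O_p\{(nb_n)^{-3/2}\}.$

Also, write $E\{y_i K_{S,h_n}(u - x_i) \mid \mathcal{F}_{i-1}\} = \int_{[-1,1]^d} m(u - h_n s, i/n) K_S(s)
f_i(u - h_n s, i/n \mid \mathcal{F}_{i-1})\,ds$. Then we have

$(nb_nh_n^d) \sum_{i=1}^n \|E\{y_i K_{S,h_n}(u - x_i) \mid \mathcal{F}_{i-1}\}\|^2 w_{b_n,i}(t)^2 = O(h_n^d),$

and (i) follows by $(nb_nh_n^d) \sum_{i=1}^n E\{y_i^2 K_{S,h_n}(u - x_i)^2\} w_{b_n,i}(t)^2 = \{m(u,t)^2 +
\sigma(u,t)^2\} f(u,t) \lambda_{K_S}\lambda_{K_T} + o(1)$. Case (ii) can be similarly proved. □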


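Proofs of Theorems 3.1 and 3.2. Letting $m = 1$ and $\sigma = 0$ in
Lemma 6.2, (3.1) and (3.3) follow directly. For (3.2), write

$\hat{f}_n(u,t)\Bigl[\hat{m}_n(u,t) - \frac{E\{\hat{T}_n(u,t)\}}{E\{\hat{f}_n(u,t)\}}\Bigr] = I_n + II_n,$

where

$I_n = [\hat{f}_n(u,t) - E\{\hat{f}_n(u,t)\}]\Bigl[m(u,t) - \frac{E\{\hat{T}_n(u,t)\}}{E\{\hat{f}_n(u,t)\}}\Bigr] = o_p\{(nb_nh_n^d)^{-1/2}\}$

and

$II_n = \{\hat{T}_n(u,t) - m(u,t)\hat{f}_n(u,t)\} - E\{\hat{T}_n(u,t) - m(u,t)\hat{f}_n(u,t)\}.$

Note that

$\hat{T}_n(u,t) - m(u,t)\hat{f}_n(u,t) = \sum_{i=1}^n \{y_i - m(u,t)\} K_{S,h_n}(u - x_i) w_{b_n,i}(t);$

by Lemma 6.2(i),

$(nb_nh_n^d)^{1/2} II_n \Rightarrow N\{0, \sigma(u,t)^2 f(u,t) \lambda_{K_S}\lambda_{K_T}\}.$

Since $\hat{f}_n(u,t) \to f(u,t)$ in probability, (3.2) follows by Slutsky's theorem.
Case (3.4) can be similarly proved. □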
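
The ratio structure used in the proof, a weighted sum of responses divided by a weighted sum of kernel values, can be sketched numerically for $d = 1$. The sketch below is an illustration under stated assumptions only: it uses the Epanechnikov kernel for both the space and time directions, and it replaces the weights $w_{b_n,i}(t)$, whose exact form is defined elsewhere in the paper, by plain kernel weights in time. The function names and the simulated data are hypothetical.

```python
import numpy as np

def epanechnikov(v):
    """Epanechnikov kernel supported on [-1, 1]."""
    return np.where(np.abs(v) <= 1.0, 0.75 * (1.0 - v ** 2), 0.0)

def tv_kernel_regression(u, t, x, y, h, b):
    """Ratio estimate T_n(u, t) / f_n(u, t) of m(u, t) for scalar x_i.

    Assumption: the time weights are simple kernel weights
    K((i/n - t)/b) / (n b), a stand-in for the paper's w_{b_n,i}(t).
    """
    n = len(x)
    times = np.arange(1, n + 1) / n
    k_space = epanechnikov((u - x) / h) / h          # K_{S,h}(u - x_i)
    w_time = epanechnikov((times - t) / b) / (n * b)
    f_hat = np.sum(k_space * w_time)                  # density-type denominator
    t_hat = np.sum(y * k_space * w_time)              # weighted responses
    if f_hat <= 0.0:
        return np.nan                                 # no observations in the window
    return t_hat / f_hat

# Illustrative data from a hypothetical time-varying linear regression function.
rng = np.random.default_rng(1)
n = 2000
x = rng.uniform(-1.0, 1.0, n)
times = np.arange(1, n + 1) / n
y = (1.0 + times) * x + 0.1 * rng.standard_normal(n)
estimate = tv_kernel_regression(0.5, 0.5, x, y, h=0.2, b=0.2)
```

Setting all responses equal to one makes the numerator coincide with the denominator, which mirrors the step "letting $m = 1$ and $\sigma = 0$" in the proof.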


Proofs of Theorems 3.3 and 3.4. We shall first prove Theorem 3.3(i). 
For this, since sup^gjQ p ||G(t; "^0)4 < oOj we have maxi<j<„ |xj| = Op(n^/^ ) 
for any r' < r. Hence, sup^gjo^i] sup|^j|^^i/,.'/^(u, t) = 0 almost surely, and 
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suPtg[o,i] sup|u|>„i/r' E{fn{u,t)} = 0{n = o{{nbnhi) Therefore, 

it suffices to deal with the case in which |u| < . We shall here assume 

that d = 1. Cases with higher dimensions can be similarly proved without 
extra essential difficulties, but they are technically tedious. Let


( 6 . 1 ) 


i=\ 
n 

i=l 


JKsis)fi{u-hnS,i/n\J^i-i)ds. 


Observe that $K_{S,h_n}(u - x_i) w_{b_n,i}(t) - E\{K_{S,h_n}(u - x_i) w_{b_n,i}(t) \mid \mathcal{F}_{i-1}\}$, $i =
1, \ldots, n$, form a sequence of bounded martingale differences. By the inequality
of Freedman (1975) and the proof of Theorem 2 in Wu, Huang and Huang
(2010), we obtain that, for some large constant $\lambda > 0$,

pr| sup sup |/„(u,t) -/°(u,t)| > A(n5„/i„)“^/^(logn)^/^| = o(n“^). 


Let i?i(u) = fi{u,i/n\J^i-i) - /(u,i/n) and 0zj(u) = '!9i(u). By (6.1) 

and the proof of Lemma 5.3 in Zhang and Wu (2012), it suffices to show 
that for all I, 

(6.2) prj max sup |0ij(u)| > (/i“^re6„logn)^'^^| = o(6„). 

I0<j<nbn |u|<„l/r' J 


Let A = {nbnhn) ^/^(logn)^/"^ and [uJa = A[u/AJ. By Theorem 2(ii) in 
Liu, Xiao and Wu (2013), under condition (A4), 


(6.3) 


pr| max sup |0ij([ujA)| > (h„^n6„logn)^/^ 

_ ^ [ nbnA-^n^/'"' \ 

1 {hn^nbn logn)9/2 / 


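By (A3), $\max_{0 \le j \le nb_n} \sup_{|u| \le n^{1/r'}} |\Theta_{l,j}(u) - \Theta_{l,j}(\lfloor u \rfloor_\Delta)| = O(nb_n\Delta)$, and (6.2)
follows. For Theorem 3.3(ii), by Lemma 6.3, $\sup_{t \in [0,1]} \sup_{u \in \mathcal{X}} |\hat{T}_n(u,t) -
E\{\hat{T}_n(u,t)\}| = O_p\{(nb_nh_n^d)^{-1/2}(\log n)^{1/2} + (nb_nh_n^d)^{-1}(n^{1/p}\log n)\}$. Since

$\hat{f}_n(u,t)\Bigl[\hat{m}_n(u,t) - \frac{E\{\hat{T}_n(u,t)\}}{E\{\hat{f}_n(u,t)\}}\Bigr] = \hat{T}_n(u,t) - E\{\hat{T}_n(u,t)\} + E\{\hat{T}_n(u,t)\}\Bigl[1 - \frac{\hat{f}_n(u,t)}{E\{\hat{f}_n(u,t)\}}\Bigr],$
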




the result follows. Theorem 3.4 can be similarly proved. □ 

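Recall that $\mathcal{X} \subset \mathbb{R}^d$ is a compact set. Lemma 6.3 provides uniform bounds
for

$U(u,t) = \sum_{i=1}^n [m(x_i, i/n) K_{S,h_n}(u - x_i) - E\{m(x_i, i/n) K_{S,h_n}(u - x_i)\}] w_{b_n,i}(t);$

$V(u,t) = \sum_{i=1}^n \sigma(x_i, i/n) \eta_i K_{S,h_n}(u - x_i) w_{b_n,i}(t);$

$U(u) = n^{-1} \sum_{i=1}^n [m(x_i, i/n) K_{S,h_n}(u - x_i) - E\{m(x_i, i/n) K_{S,h_n}(u - x_i)\}];$

$V(u) = n^{-1} \sum_{i=1}^n \sigma(x_i, i/n) \eta_i K_{S,h_n}(u - x_i),$

and is useful in proving Theorems 3.3 and 3.4.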

Lemma 6.3. Assume (A1), (A3), (A4), $\eta_i \in \mathcal{L}^p$ for some $p > 2$, $i =
1, \ldots, n$, $b_n \to 0$ and $h_n \to 0$. Let $\chi_n = n^{1/p}\log n$. (i) If $nb_nh_n^d \to \infty$ and

fi‘^+d-qi)^-^ffLd-+<i) ^ t/ien 

(6.4)   $\sup_{t \in [0,1]} \sup_{u \in \mathcal{X}} |U(u,t)| = O_p\{(nb_nh_n^d)^{-1/2}(\log n)^{1/2}\},$

(6.5)   $\sup_{t \in [0,1]} \sup_{u \in \mathcal{X}} |V(u,t)| = O_p\{(nb_nh_n^d)^{-1/2}(\log n)^{1/2} + (nb_nh_n^d)^{-1}\chi_n\}.$

(ii) If nh'^ —7- oo and 0, then 

(6.6)   $\sup_{u \in \mathcal{X}} |U(u)| = O_p\{(nh_n^d)^{-1/2}(\log n)^{1/2}\},$

(6.7)   $\sup_{u \in \mathcal{X}} |V(u)| = O_p\{(nh_n^d)^{-1/2}(\log n)^{1/2} + (nh_n^d)^{-1}\chi_n\}.$

Proof. The proof of (6.4) is similar to that of Theorem 3.3(i), and
we shall only outline the key differences. First, the supremum in (6.4) is
taken over $u \in \mathcal{X}$, a compact set, instead of $\mathbb{R}^d$. Hence the truncation
argument is no longer needed, and the term $\Delta^{-1}n^{1/r'}$ in (6.3) can be replaced
by $\Delta^{-1}$. Second, $E\{m(x_i, i/n) K_{S,h_n}(u - x_i) \mid \mathcal{F}_{i-1}\} = \int_{[-1,1]^d} K_S(s) f_i^\dagger(u -
h_n s, i/n \mid \mathcal{F}_{i-1})\,ds$, where $f_i^\dagger(u, t \mid \mathcal{F}_{i-1}) = m(u,t) f_i(u, t \mid \mathcal{F}_{i-1})$. By (A1), $f_i^\dagger$
satisfies condition (A3), and its predictive dependence measure is of order
(2.5). Hence the proof of Theorem 3.3(i) applies. Case (6.6) can be similarly
handled. For (6.5) and (6.7), we shall only provide the proof of (6.7)




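since (6.5) can be similarly derived. Let $\eta_i^* = \eta_i 1_{\{|\eta_i| \le n^{1/p}\}}$ and $V^*(u)$ be the
counterpart of $V(u)$ with $\eta_i$ therein replaced by $\eta_i^*$, $i = 1, \ldots, n$. Also, let
$\eta_i^\diamond = \eta_i^* - E(\eta_i^*)$, and we can similarly define $V^\diamond(u)$. Since $\eta_i \in \mathcal{L}^p$ are i.i.d.,
we have $\max_{1 \le i \le n} |\eta_i| = o_p(n^{1/p})$ and $\mathrm{pr}\{V(u) = V^*(u) \text{ for all } u \in \mathcal{X}\} \to 1$.
In addition,

$V^*(u) - V^\diamond(u) = n^{-1} E(\eta_1^*) \sum_{i=1}^n \sigma(x_i, i/n) K_{S,h_n}(u - x_i).$

Since $E(\eta_i) = 0$, we have $E(\eta_i^*) = -E(\eta_i 1_{\{|\eta_i| > n^{1/p}\}}) = O(n^{1/p-1})$, and by
(6.6), it suffices to show that (6.7) holds with $V^\diamond(u)$. Let $\mathcal{X}^+ = \{u \in \mathbb{R}^d : |u -
v| \le 1 \text{ for some } v \in \mathcal{X}\}$, $c_K = \sup_{v \in [-1,1]^d} |K_S(v)|$, $c_1 = \mathrm{var}(\eta_i^*)$ and $c_2 =
\sup_{t \in [0,1]} \sup_{u \in \mathcal{X}^+} \sigma(u,t)^2 < \infty$ under (A1). Recall $c_0$ from (A3), then
$|\sigma(x_i, i/n) \eta_i^\diamond K_{S,h_n}(u - x_i)| \le 2c_2^{1/2} c_K n^{1/p} h_n^{-d}$ and

$E\{\sigma(x_i, i/n)^2 (\eta_i^\diamond)^2 K_{S,h_n}(u - x_i)^2 \mid \mathcal{F}_{i-1}\} \le h_n^{-d} c_0 c_1 c_2 \lambda_{K_S}.$

Let $w_n = (nh_n^d)^{-1/2}(\log n)^{1/2} + (nh_n^d)^{-1}(n^{1/p}\log n)$. Applying the inequality
of Freedman (1975) to $V^\diamond(u)$, we obtain that, for some large constant $\lambda > 0$,

$\mathrm{pr}\{|V^\diamond(u)| \ge \lambda w_n\} \le 2\exp\Bigl(-\frac{\lambda^2 w_n^2}{4c_2^{1/2} c_K \lambda n^{1/p-1} h_n^{-d} w_n + 2c_0 c_1 c_2 \lambda_{K_S} n^{-1} h_n^{-d}}\Bigr),$

and (6.7) follows by the discretization argument as in (6.3). □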
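
The order of $E(\eta_i^*)$ used in the proof follows from a short moment bound. The derivation below is a sketch that uses only $E(\eta_i) = 0$ and $\eta_i \in \mathcal{L}^p$; it is written out here for convenience and is not quoted from the paper.

```latex
\begin{align*}
|E(\eta_i^*)|
  &= \bigl|E\bigl(\eta_i 1_{\{|\eta_i| \le n^{1/p}\}}\bigr)\bigr|
   = \bigl|E\bigl(\eta_i 1_{\{|\eta_i| > n^{1/p}\}}\bigr)\bigr| \\
  &\le E\bigl(|\eta_i|^p \, n^{-(p-1)/p} \, 1_{\{|\eta_i| > n^{1/p}\}}\bigr)
   \le n^{1/p - 1} E|\eta_i|^p = O(n^{1/p - 1}).
\end{align*}
```

The second equality uses $E(\eta_i) = 0$, and the first inequality uses $|\eta_i| = |\eta_i|^p |\eta_i|^{-(p-1)} \le |\eta_i|^p n^{-(p-1)/p}$ on the event $\{|\eta_i| > n^{1/p}\}$.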



Let Un = {nbnh'^)~^ logn + 6^ + Lemmas 6.4-6.7 provide asymptotic 
properties of the restricted residual sum of squares for models I-IV, respec¬ 
tively, and are useful in proving Theorem 3.5. We shall here only provide the 
proof of Lemmas 6.4 and 6.5, which relate to nonparametric kernel estima¬ 
tion of nonlinear regression functions that have been studied in Sections 3.1 
and 3.2. Lemmas 6.6 and 6.7 relate to linear models with time-varying and 
time-constant coefficients, and the proof is available in the supplementary 
material [Zhang and Wu (2015)]. 


Lemma 6.4. Assume (Al), (A3)~(A5), rji G for some p > 2, i = 
1,... ,n, bn^O, hn —>■ 0 and nbnh^/{logn)^ —> 00. If —>■ 0 

and ^ 0, then 


n RSSn,(<^, iT, I) — n ^ ^ A Op 


bn T hfi 1 

(n/i^)V2j- 
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Proof. Note that one can have the decomposition 

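$n^{-1}\mathrm{RSS}_n(\mathcal{X}, \mathcal{T}, \mathrm{I}) = n^{-1} \sum_{i \in \mathcal{T}_n} 1_{\{x_i \in \mathcal{X}\}} e_i^2 + I_n - 2II_n,$

where $I_n = n^{-1} \sum_{i \in \mathcal{T}_n} \{\hat{m}_n(x_i, i/n) - m(x_i, i/n)\}^2 1_{\{x_i \in \mathcal{X}\}} = O_p(\omega_n)$ by
Theorem 3.3, and

$II_n = n^{-1} \sum_{i \in \mathcal{T}_n} \{\hat{m}_n(x_i, i/n) - m(x_i, i/n)\} 1_{\{x_i \in \mathcal{X}\}} e_i.$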

We shall now deal with the term //„. By Lemma 6.3(i) and Theorem 3.3, 
suPtG^^ supug^ |{/n(u,t) - /(u,t)}{m„(u,t) -m(u,t)}| = Op{uJn) and thus 


sup sup 


mn(u, t) — m(u, t) — 




- Opiuiyi) ■ 


Let = {m{xj,j/n) — m(xj,i/n)}, and we can then write 

IIn — IIn,L T IIn,Q T Opi^^n) ^ 

where 

„ _i Y- ^i=i - ^j)wb„,j{i/n) 

IIn,L = n > - - -—w-l{xiG.r}ei 


and 


i&J^n 


IIn,Q=n ^ Y Y1 

i = l 


i^5,fen(x—Xj)u;fc„j(i/n)]I{^.g^} 

f{y.i,i/n) 


e,;e 


JC.J. 


Using the orthogonality of martingale differences and Lemma 2 of Wu,
Huang and Huang (2010), we have $II_{n,L} = O_p\{(nh_n^d)^{-1/2}(b_n + h_n)\}$. Also, by
splitting the sum in $II_{n,Q}$ for cases with $i = j$ and $i \ne j$, one can have $II_{n,Q} =$
Op{{nbn)~^ + Lemma 6.4 follows by = 

o{{bnhi)-^}. □ 


Lemma 6.5. Assume (Al), (A3)“(A5), iji G O' for some p > 2, i = 
hn^O and nh^ —)> oo. If 0 and —?> 0, 

then (i) 



n ^RSSn(=^, H) = / / {m(u,t) — m(u)} f{u,t)dtdu 


X Jsr 


+ n-i^e?l,,.e*.|+Op|(i^) 




1/2 


+ hi 
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(ii) If in addition model II is correctly specified, then 


n ^RSSn(^,^,II) = n ^ ^ e2l{^,g^} + Op| 


J logn 




nhi + + 


1 

(n/i^)V2 J • 


Proof. By Theorem 3.4, 

RSSn(=^, 5^, II) = ^ [{Vi - m(xi)} - {/In(xi) - 

= In + Op[n{{nhi)~^/‘^{\ognf/‘^ + hi}], 

where by Lemma 2 in Wu, Huang and Huang (2010), 

4 = ^ [{Vi - m(xi,i/n)} -L {m(xi,i/n) - m(xi)}]^l|x,g^} 

i&J’n i&J^n 

Since IX’ G is a compact set, by the proof of Lemma 6.2, we have 
{m{xi,i/n) - ?fi(xi)}^l{x.6^} 

iSJ^n 


= Y -"i(xi)}^l{xie.r}]+ Op(n^/^) 

= n I I {m{u,t) — fh{u)}‘^ f{u,t) dtdu +0{1 + 
Jar Jar 


and (i) follows. Case (ii) follows by a similar argument as in Lemma 6.4. □ 


Lemma 6.6. Assume (A1)-(A3), (A6), (P2) and $\eta_i \in \mathcal{L}^p$ for some $p > 2$,
$i = 1, \ldots, n$. If $b_n \to 0$ and $nb_n \to \infty$, then (i)


n ^RSSn(c^, =^,111) = 



{m(u,t) — (3n{t)}‘^ f {u,t) dt du 

'arJar 

+ n-^ Y e?l{xiG.r}+ Op((?i>n). 


(ii) If in addition model III is correctly specified, then


n 


-^RSS„(^,^,HI) = n ^ + 
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Lemma 6.7. Assume (Al)-(A3), (A6), (PI) and rji G for some p> 2, 
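$i = 1, \ldots, n$. Then (i)

$n^{-1}\mathrm{RSS}_n(\mathcal{X}, \mathcal{T}, \mathrm{IV}) = \int_{\mathcal{X}} \int_{\mathcal{T}} \{m(u,t) - u^\top\theta_n\}^2 f(u,t)\,dt\,du + n^{-1} \sum_{i \in \mathcal{T}_n} e_i^2 1_{\{x_i \in \mathcal{X}\}} + O_p(n^{-1/2}).$

(ii) If in addition model IV is correctly specified, then

$n^{-1}\mathrm{RSS}_n(\mathcal{X}, \mathcal{T}, \mathrm{IV}) = n^{-1} \sum_{i \in \mathcal{T}_n} e_i^2 1_{\{x_i \in \mathcal{X}\}} + O_p(n^{-1}).$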

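Proof of Theorem 3.5. For model I, the AMSE optimal bandwidths
satisfy $b_n(\mathrm{I}) \asymp n^{-1/(d+5)}$ and $h_n(\mathrm{I}) \asymp n^{-1/(d+5)}$. By Lemma 6.4, we have

$\log\{\mathrm{RSS}_n(\mathcal{X}, \mathcal{T}, \mathrm{I})/n\} = \log\Bigl(n^{-1} \sum_{i \in \mathcal{T}_n} e_i^2 1_{\{x_i \in \mathcal{X}\}}\Bigr) + O_p(n^{-7/(2d+10)}).$

Under the stated conditions on the tuning parameter, we have $n^{-7/(2d+10)} =
o\{\tau_n(b_nh_n^d)^{-1}\}$, and thus the estimation error is dominated by $\tau_n\mathrm{DF}(\mathrm{I})$, which
goes to zero as $n \to \infty$. By Lemmas 6.5-6.7, similar results can be derived
for models II-IV. Note that

$\tau_n \max\{\mathrm{DF}(\mathrm{I}), \mathrm{DF}(\mathrm{II}), \mathrm{DF}(\mathrm{III}), \mathrm{DF}(\mathrm{IV})\} = o(1),$

which will be dominated by any model misspecification. The result follows
by $\mathrm{DF}(\mathrm{IV}) < \min\{\mathrm{DF}(\mathrm{II}), \mathrm{DF}(\mathrm{III})\} \le \max\{\mathrm{DF}(\mathrm{II}), \mathrm{DF}(\mathrm{III})\} < \mathrm{DF}(\mathrm{I})$. □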
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SUPPLEMENTARY MATERIAL 

Additional technical proofs (DOI: 10.1214/14-AOS1299SUPP; .pdf). This 
supplement contains technical proofs of Lemmas 6.6 and 6.7, Proposition 3.1 
and Theorems 3.6 and 3.7. 
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